
    Italian Event Detection Goes Deep Learning

    This paper reports on a set of experiments with different word embeddings used to initialize a state-of-the-art Bi-LSTM-CRF network for event detection and classification in Italian, following the EVENTI evaluation exercise. The network obtains a new state-of-the-art result, improving the F1 score by 1.3 points for detection and by 6.5 points for classification, using a single-step approach. The results also provide further evidence that embeddings have a major impact on the performance of such architectures. Comment: to appear at CLiC-it 201
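The abstract's central claim is that the choice of pretrained word embeddings drives the tagger's performance. A minimal sketch of the usual embedding-initialization step before training such a network (all names and values here are illustrative; the paper does not publish its code):

```python
import random

def build_embedding_matrix(vocab, pretrained, dim=100, seed=0):
    """Initialize an embedding matrix from pretrained vectors.

    Words found in `pretrained` copy their vector; out-of-vocabulary
    words get small random values, as is common before feeding the
    matrix to a Bi-LSTM-CRF tagger.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
        else:
            # Hypothetical OOV strategy: uniform noise in [-0.25, 0.25].
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix

# Toy usage: two known Italian words, one OOV token.
pretrained = {"evento": [0.1, 0.2], "sciopero": [0.3, 0.4]}
emb = build_embedding_matrix(["evento", "sciopero", "<unk>"], pretrained, dim=2)
```

Swapping the `pretrained` dictionary for different embedding sets while keeping the rest of the pipeline fixed is exactly the kind of comparison the experiments describe.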

    H2 ortho-to-para conversion on grains: A route to fast deuterium fractionation in dense cloud cores?

    Deuterium fractionation, i.e. the enhancement of deuterated species with respect to non-deuterated ones, is considered a reliable chemical clock of star-forming regions. This process is strongly affected by the ortho-to-para (o-p) H2 ratio. In this letter we explore the effect of o-p H2 conversion on grains on the deuteration timescale in fully depleted dense cores, including the most relevant uncertainties that affect this complex process. We show that (i) the o-p H2 conversion on grains is not strongly influenced by the uncertainties on the conversion time and the sticking coefficient, and (ii) the process is controlled by the temperature and the residence time of ortho-H2 on the surface, i.e. by the binding energy. We find that for binding energies between 330 and 550 K, depending on the temperature, the o-p H2 conversion on grains can shorten the deuterium fractionation timescale by orders of magnitude, opening a new route to explain the large observed deuteration fraction D_frac in dense molecular cloud cores. Our results suggest that the star formation timescale, when estimated through the timescale to reach the observed deuteration fractions, might be shorter than previously proposed. However, more accurate measurements of the binding energy are needed to better assess the overall role of this process. Comment: Accepted for publication in ApJ Letters
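The residence time of ortho-H2 on a grain surface is set by the binding energy E_b and the dust temperature through the standard surface-chemistry expression t = nu0^-1 exp(E_b / T_dust). A small illustrative sketch of why the quoted 330-550 K window matters (nu0 and the 10 K dust temperature are typical assumed values, not taken from the letter):

```python
import math

def residence_time(E_b, T_dust, nu0=1e12):
    """Mean residence time (s) of a species on a grain surface,
    t = nu0**-1 * exp(E_b / T_dust), with E_b and T_dust in kelvin.

    nu0 is a characteristic vibrational attempt frequency; the value
    ~1e12 s^-1 is a common assumption, not a number from the letter.
    """
    return math.exp(E_b / T_dust) / nu0

# At a 10 K grain temperature, the 330-550 K binding-energy window
# quoted in the abstract spans many orders of magnitude in residence
# time, which is why E_b controls whether conversion can occur.
t_low  = residence_time(330.0, 10.0)
t_high = residence_time(550.0, 10.0)
```

The exponential dependence explains the abstract's point that the binding energy, rather than the conversion time or sticking coefficient, is the controlling uncertainty.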

    ProTestA: Identifying and Extracting Protest Events in News. Notebook for ProtestNews Lab at CLEF 2019

    This notebook describes our participation in the ProtestNews Lab, identifying protest events in news articles in English. Systems are challenged to perform unsupervised domain adaptation across three sub-tasks: document classification, sentence classification, and event extraction. We describe the final submitted systems for all sub-tasks, as well as a series of negative results. Results indicate fairly robust performance in all tasks (average F1 of 0.705 for the document classification sub-task, average F1 of 0.592 for the sentence classification sub-task, and average F1 of 0.528 for the event extraction sub-task), ranking among the top 4 systems, although the drops on the out-of-domain test sets are not negligible.
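The scores quoted above are F1 values, the harmonic mean of precision and recall. A minimal reference implementation of the metric (the counts used in the usage line are made up for illustration):

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative
    counts: the harmonic mean of precision tp/(tp+fp) and recall
    tp/(tp+fn), with the conventional value 0.0 on empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical usage: 50% precision and 50% recall give F1 = 0.5.
score = f1(tp=1, fp=1, fn=1)
```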

    Time, events and temporal relations: an empirical model for temporal processing of Italian texts

    The aim of this work is the elaboration of a computational model for the identification of temporal relations in text/discourse, to be used as a component in more complex systems for Open-Domain Question Answering, Information Extraction and Summarization. More specifically, the thesis concentrates on the relationships between the various elements which signal temporal relations in Italian texts/discourses, on their roles and on how they can be exploited. Time is a pervasive element of human life. It is the primary element thanks to which we are able to observe, describe and reason about what surrounds us and the world. Failure to correctly identify the temporal ordering of what is narrated and/or described may result in poor comprehension and lead to misunderstanding. Normally, texts/discourses present situations standing in a particular temporal ordering. Whether these situations precede, overlap, or include one another is inferred during the general process of reading and understanding. Nevertheless, to perform this seemingly easy task, we take into account a set of complex information involving different linguistic entities and sources of knowledge. A wide variety of devices is used in natural languages to convey temporal information. Verb tense, temporal prepositions, subordinate conjunctions and adjectival phrases are some of the most obvious. Nevertheless, even these obvious devices have different degrees of temporal transparency, which may be less obvious than a quick, superficial analysis suggests. One of the main shortcomings of previous research on temporal relations is that it concentrated only on a particular discourse segment, namely narrative discourse, disregarding the fact that a text/discourse is composed of different types of discourse segments and relations. A good theory or framework for temporal analysis must take all of them into account. 
In this work, we have concentrated on the elaboration of a framework which could be applied to all text/discourse segments, without paying too much attention to their type, since we claim that temporal relations can be recovered in every kind of discourse segment and not only in narrative ones. The model we propose is obtained by combining theoretical assumptions with empirical data, collected by means of two tests submitted to a total of 35 subjects with different backgrounds. The main results we have obtained from these empirical studies are: (i.) a general evaluation of the difficulty of the task of recovering temporal relations; (ii.) information on the level of granularity of temporal relations; (iii.) a saliency-based order of application of the linguistic devices used to express the temporal relations between two eventualities; (iv.) the proposal of tense temporal polysemy, as a device to identify the set of preferences which can assign unique values to possibly multiple temporal relations. On the basis of the empirical data, we propose to enlarge the set of classical finely grained interval relations (Allen, 1983) by also including coarse-grained temporal relations (Freksa, 1992). Moreover, there could be cases in which we are not able to state reliably whether a temporal relation exists or what the particular relation between two entities is. To overcome this issue we have adopted the proposal by Mani (2007), which allows the system to have differentiated levels of temporal representation on the basis of the temporal granularity associated with each discourse segment. The lack of an annotated corpus for eventualities, temporal expressions and temporal relations in Italian represents the biggest shortcoming of this work, as it has prevented the implementation of the model and its evaluation. Nevertheless, we have been able to conduct a series of experiments for the validation of procedures for the further realization of a working prototype. 
In addition to this, we have been able to implement and validate a working prototype for spotting temporal expressions in texts/discourses.
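The fine-grained interval relations of Allen (1983) mentioned above can be sketched as a classifier over interval endpoints, and coarse-grained groupings in the spirit of Freksa (1992) then collapse distinctions the empirical data cannot reliably make. A minimal sketch (the coarse grouping shown is an illustrative subset, not the thesis's actual scheme):

```python
def allen_relation(a, b):
    """Return the Allen (1983) relation between closed intervals
    a = (a1, a2) and b = (b1, b2); one of the 13 basic relations."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1: return "before"
    if b2 < a1: return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

# Illustrative coarse-grained grouping: when fine distinctions are
# unreliable, several Allen relations collapse into one coarse label.
COARSE = {"before": "precedes", "meets": "precedes",
          "after": "follows", "met-by": "follows"}
```

Classifying at the coarse level when annotators disagree on the fine level is one way to realize the differentiated granularity the abstract attributes to Mani (2007).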

    Identifying communicative functions in discourse with content types

    Texts are not monolithic entities but rather coherent collections of micro illocutionary acts which help to convey a unitary message of content and purpose. Identifying such text segments is challenging because they require a fine-grained level of analysis, even within a single sentence. At the same time, accessing them facilitates the analysis of the communicative functions of a text as well as the identification of relevant information. We propose an empirical framework for modelling micro illocutionary acts at the clause level, which we call content types, grounded in linguistic theories of text types, in particular the framework proposed by Werlich in 1976. We make available a newly annotated corpus of 279 documents (for a total of more than 180,000 tokens) belonging to different genres and temporal periods, based on a dedicated annotation scheme. We obtain an average Cohen's kappa of 0.89 at token level. We achieve an average F1 score of 74.99% on the automatic classification of content types using a bi-LSTM model. Similar results are obtained on contemporary and historical documents, while performances across genres are more varied. This work promotes a discourse-oriented approach to information extraction and cross-fertilisation across disciplines through a computationally-aided linguistic analysis.
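The token-level agreement figure quoted above is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. A minimal stdlib sketch of the metric (the toy labels are made up for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e the chance agreement implied by each
    annotator's label marginals."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy usage: 3/4 observed agreement over two labels.
k = cohens_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"])
```

A kappa of 0.89 at token level, as reported, indicates agreement well above what the label distribution alone would produce.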